Overview

Dataset statistics

Number of variables: 9
Number of observations: 767
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.0 KiB
Average record size in memory: 72.1 B

Variable types

Numeric: 8
Categorical: 1

Warnings

Pregnancies has 111 (14.5%) zeros
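The zeros warning is a plain count and percentage. The following is a minimal sketch in plain Python. It assumes the column is held as a list of numbers. The helper name `zeros_summary` is hypothetical and is not part of pandas-profiling.

```python
def zeros_summary(values):
    """Return (number of zeros, percentage of zeros) for a numeric column."""
    n_zeros = sum(1 for v in values if v == 0)
    return n_zeros, 100.0 * n_zeros / len(values)


# Synthetic column with the counts reported here: 111 zeros among 767 observations.
count, pct = zeros_summary([0] * 111 + [1] * 656)
print(count, round(pct, 1))  # 111 14.5
```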

Reproduction

Analysis started: 2021-04-20 03:49:56.082605
Analysis finished: 2021-04-20 03:50:08.881058
Duration: 12.8 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml

Variables

Pregnancies
Real number (ℝ≥0)

ZEROS

Distinct: 17
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.848761408
Minimum: 0
Maximum: 17
Zeros: 111
Zeros (%): 14.5%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 3
Q3: 6
95th percentile: 10
Maximum: 17
Range: 17
Interquartile range (IQR): 5
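The quantile statistics can be reproduced with a short sketch. It assumes linear interpolation between order statistics, which is the default of pandas' `Series.quantile`. The helper names `quantile` and `quantile_statistics` are hypothetical.

```python
import math


def quantile(values, p):
    """Quantile with linear interpolation: position h = (n - 1) * p in sorted data."""
    data = sorted(values)
    h = (len(data) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(data) - 1)  # clamp so p = 1 returns the maximum
    return data[lo] + (h - lo) * (data[hi] - data[lo])


def quantile_statistics(values):
    """Summary mirroring the 'Quantile statistics' block of the report."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    return {
        "min": min(values),
        "p5": quantile(values, 0.05),
        "q1": q1,
        "median": quantile(values, 0.5),
        "q3": q3,
        "p95": quantile(values, 0.95),
        "max": max(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }
```

For example, `quantile_statistics([1, 2, 3, 4])` gives Q1 = 1.75, median = 2.5, Q3 = 3.25 and IQR = 1.5.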

Descriptive statistics

Standard deviation: 3.370207423
Coefficient of variation (CV): 0.8756602619
Kurtosis: 0.1563476434
Mean: 3.848761408
Median Absolute Deviation (MAD): 2
Skewness: 0.8998253832
Sum: 2952
Variance: 11.35829807
Monotonicity: Not monotonic
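The coefficient of variation and the median absolute deviation are derived from simpler statistics. The sketch below uses the standard library's `statistics` module and makes two assumptions:

- The standard deviation is the sample standard deviation (n - 1 denominator, the pandas default).
- MAD is the median of the absolute deviations from the median, as the label states.

`dispersion_summary` is a hypothetical helper.

```python
import statistics


def dispersion_summary(values):
    """Mean, sample std/variance, coefficient of variation and median absolute deviation."""
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # sample variance, n - 1 denominator
    std = statistics.stdev(values)
    median = statistics.median(values)
    # MAD: median of the absolute deviations from the median
    mad = statistics.median([abs(v - median) for v in values])
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean,  # raises ZeroDivisionError when the mean is 0
        "mad": mad,
    }
```

The figures reported above are consistent with these definitions. For example, 3.370207423 / 3.848761408 gives the reported CV of about 0.8757, and 3.370207423 squared gives the reported variance of about 11.358.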
Histogram with fixed size bins (bins=17)
Common values
Value | Count | Frequency (%)
1 | 134 | 17.5%
0 | 111 | 14.5%
2 | 103 | 13.4%
3 | 75 | 9.8%
4 | 68 | 8.9%
5 | 57 | 7.4%
6 | 50 | 6.5%
7 | 45 | 5.9%
8 | 38 | 5.0%
9 | 28 | 3.7%
Other values (7) | 58 | 7.6%
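A common-values table like this one can be sketched with `collections.Counter`. The helper names are hypothetical. Equal counts are ordered by first occurrence here, which may not match pandas-profiling's ordering.

The second helper illustrates the difference between two summary fields:

- Distinct is the number of different values.
- Unique is the number of values that occur exactly once. This reading of the field is an assumption.

```python
from collections import Counter


def value_count_table(values, top=10):
    """Rows of (value, count, frequency %) for the `top` values plus an 'Other values' row."""
    n = len(values)
    counts = Counter(values)
    rows = [(value, count, 100.0 * count / n) for value, count in counts.most_common(top)]
    rest = len(counts) - len(rows)  # number of distinct values not listed
    if rest > 0:
        other = n - sum(count for _, count, _ in rows)
        rows.append((f"Other values ({rest})", other, 100.0 * other / n))
    return rows


def distinct_and_unique(values):
    """Return (distinct count, distinct %, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, 100.0 * distinct / len(values), unique
```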
Minimum 5 values
Value | Count | Frequency (%)
0 | 111 | 14.5%
1 | 134 | 17.5%
2 | 103 | 13.4%
3 | 75 | 9.8%
4 | 68 | 8.9%

Maximum 5 values
Value | Count | Frequency (%)
17 | 1 | 0.1%
15 | 1 | 0.1%
14 | 2 | 0.3%
13 | 10 | 1.3%
12 | 9 | 1.2%

Glucose
Real number (ℝ≥0)

Distinct: 136
Distinct (%): 17.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 121.7192366
Minimum: 44
Maximum: 199
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 44
5th percentile: 80
Q1: 100
Median: 117
Q3: 140.5
95th percentile: 181
Maximum: 199
Range: 155
Interquartile range (IQR): 40.5

Descriptive statistics

Standard deviation: 30.43821062
Coefficient of variation (CV): 0.2500690232
Kurtosis: -0.2596951528
Mean: 121.7192366
Median Absolute Deviation (MAD): 20
Skewness: 0.5311836305
Sum: 93358.6545
Variance: 926.4846655
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
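Common values
Value | Count | Frequency (%)
100 | 17 | 2.2%
99 | 17 | 2.2%
111 | 14 | 1.8%
129 | 14 | 1.8%
125 | 14 | 1.8%
106 | 14 | 1.8%
112 | 13 | 1.7%
95 | 13 | 1.7%
102 | 13 | 1.7%
108 | 13 | 1.7%
Other values (126) | 625 | 81.5%

Minimum 5 values
Value | Count | Frequency (%)
44 | 1 | 0.1%
56 | 1 | 0.1%
57 | 2 | 0.3%
61 | 1 | 0.1%
62 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
199 | 1 | 0.1%
198 | 1 | 0.1%
197 | 4 | 0.5%
196 | 3 | 0.4%
195 | 2 | 0.3%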

BloodPressure
Real number (ℝ≥0)

Distinct: 47
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.25769307
Minimum: 24
Maximum: 122
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 24
5th percentile: 52
Q1: 64
Median: 72
Q3: 80
95th percentile: 90
Maximum: 122
Range: 98
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 12.12357736
Coefficient of variation (CV): 0.1677825135
Kurtosis: 1.074110209
Mean: 72.25769307
Median Absolute Deviation (MAD): 8
Skewness: 0.1722404697
Sum: 55421.65059
Variance: 146.981128
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=47)
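Common values
Value | Count | Frequency (%)
70 | 56 | 7.3%
74 | 52 | 6.8%
68 | 45 | 5.9%
78 | 45 | 5.9%
72 | 44 | 5.7%
64 | 43 | 5.6%
80 | 40 | 5.2%
76 | 39 | 5.1%
60 | 37 | 4.8%
69.10430248 | 35 | 4.6%
Other values (37) | 331 | 43.2%

Minimum 5 values
Value | Count | Frequency (%)
24 | 1 | 0.1%
30 | 2 | 0.3%
38 | 1 | 0.1%
40 | 1 | 0.1%
44 | 4 | 0.5%

Maximum 5 values
Value | Count | Frequency (%)
122 | 1 | 0.1%
114 | 1 | 0.1%
110 | 3 | 0.4%
108 | 2 | 0.3%
106 | 3 | 0.4%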

SkinThickness
Real number (ℝ≥0)

Distinct: 51
Distinct (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.59671352
Minimum: 7
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 7
5th percentile: 14.3
Q1: 20.52281617
Median: 23
Q3: 32
95th percentile: 44
Maximum: 99
Range: 92
Interquartile range (IQR): 11.47718383

Descriptive statistics

Standard deviation: 9.638762095
Coefficient of variation (CV): 0.362404253
Kurtosis: 3.897513402
Mean: 26.59671352
Median Absolute Deviation (MAD): 5
Skewness: 1.22782446
Sum: 20399.67927
Variance: 92.90573472
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
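Common values
Value | Count | Frequency (%)
20.52281617 | 227 | 29.6%
32 | 31 | 4.0%
30 | 27 | 3.5%
27 | 23 | 3.0%
23 | 22 | 2.9%
33 | 20 | 2.6%
18 | 20 | 2.6%
28 | 20 | 2.6%
31 | 18 | 2.3%
19 | 18 | 2.3%
Other values (41) | 341 | 44.5%

Minimum 5 values
Value | Count | Frequency (%)
7 | 2 | 0.3%
8 | 2 | 0.3%
10 | 5 | 0.7%
11 | 6 | 0.8%
12 | 7 | 0.9%

Maximum 5 values
Value | Count | Frequency (%)
99 | 1 | 0.1%
63 | 1 | 0.1%
60 | 1 | 0.1%
56 | 1 | 0.1%
54 | 2 | 0.3%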

Insulin
Real number (ℝ≥0)

Distinct: 186
Distinct (%): 24.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 118.7614251
Minimum: 14
Maximum: 846
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 14
5th percentile: 50
Q1: 79.90352021
Median: 79.90352021
Q3: 127.5
95th percentile: 293
Maximum: 846
Range: 832
Interquartile range (IQR): 47.59647979

Descriptive statistics

Standard deviation: 93.10934221
Coefficient of variation (CV): 0.7840032413
Kurtosis: 14.1282867
Mean: 118.7614251
Median Absolute Deviation (MAD): 3.096479791
Skewness: 3.290164681
Sum: 91090.01304
Variance: 8669.349607
Monotonicity: Not monotonic
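Skewness and kurtosis such as the large values above are reported as bias-corrected sample estimates. The sketch below assumes the adjusted Fisher-Pearson estimators that pandas uses for `Series.skew()` and `Series.kurt()`. In that convention kurtosis is excess kurtosis, so a normal distribution scores about 0. The helper name is hypothetical.

```python
import statistics


def skewness_and_kurtosis(values):
    """Bias-corrected sample skewness (G1) and excess kurtosis (G2); needs n >= 4."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments with a 1/n denominator
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    # Small-sample corrections
    skew = (n * (n - 1)) ** 0.5 / (n - 2) * g1
    kurt = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))
    return skew, kurt
```

For example, `[1, 2, 3, 4, 5]` gives a skewness of 0 and an excess kurtosis of about -1.2.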
Histogram with fixed size bins (bins=50)
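Common values
Value | Count | Frequency (%)
79.90352021 | 373 | 48.6%
105 | 11 | 1.4%
130 | 9 | 1.2%
140 | 9 | 1.2%
120 | 8 | 1.0%
180 | 7 | 0.9%
100 | 7 | 0.9%
94 | 7 | 0.9%
115 | 6 | 0.8%
110 | 6 | 0.8%
Other values (176) | 324 | 42.2%

Minimum 5 values
Value | Count | Frequency (%)
14 | 1 | 0.1%
15 | 1 | 0.1%
16 | 1 | 0.1%
18 | 2 | 0.3%
22 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
846 | 1 | 0.1%
744 | 1 | 0.1%
680 | 1 | 0.1%
600 | 1 | 0.1%
579 | 1 | 0.1%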

BMI
Real number (ℝ≥0)

Distinct: 248
Distinct (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.45350873
Minimum: 18.2
Maximum: 67.1
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 18.2
5th percentile: 22.23
Q1: 27.5
Median: 32
Q3: 36.6
95th percentile: 44.41
Maximum: 67.1
Range: 48.9
Interquartile range (IQR): 9.1

Descriptive statistics

Standard deviation: 6.879458326
Coefficient of variation (CV): 0.2119788767
Kurtosis: 0.9161725942
Mean: 32.45350873
Median Absolute Deviation (MAD): 4.5
Skewness: 0.5996711105
Sum: 24891.8412
Variance: 47.32694686
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
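Common values
Value | Count | Frequency (%)
32 | 13 | 1.7%
31.6 | 12 | 1.6%
31.2 | 12 | 1.6%
31.99465451 | 11 | 1.4%
33.3 | 10 | 1.3%
32.4 | 10 | 1.3%
30.1 | 9 | 1.2%
30.8 | 9 | 1.2%
32.9 | 9 | 1.2%
32.8 | 9 | 1.2%
Other values (238) | 663 | 86.4%

Minimum 5 values
Value | Count | Frequency (%)
18.2 | 3 | 0.4%
18.4 | 1 | 0.1%
19.1 | 1 | 0.1%
19.3 | 1 | 0.1%
19.4 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
67.1 | 1 | 0.1%
59.4 | 1 | 0.1%
57.3 | 1 | 0.1%
55 | 1 | 0.1%
53.2 | 1 | 0.1%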

DiabetesPedigreeFunction
Real number (ℝ≥0)

Distinct: 517
Distinct (%): 67.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4720808344
Minimum: 0.078
Maximum: 2.42
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0.078
5th percentile: 0.1403
Q1: 0.2435
Median: 0.374
Q3: 0.6265
95th percentile: 1.1333
Maximum: 2.42
Range: 2.342
Interquartile range (IQR): 0.383

Descriptive statistics

Standard deviation: 0.3314962775
Coefficient of variation (CV): 0.702202363
Kurtosis: 5.584011879
Mean: 0.4720808344
Median Absolute Deviation (MAD): 0.169
Skewness: 1.917791098
Sum: 362.086
Variance: 0.109889782
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
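Common values
Value | Count | Frequency (%)
0.258 | 6 | 0.8%
0.254 | 6 | 0.8%
0.268 | 5 | 0.7%
0.207 | 5 | 0.7%
0.261 | 5 | 0.7%
0.238 | 5 | 0.7%
0.259 | 5 | 0.7%
0.299 | 4 | 0.5%
0.27 | 4 | 0.5%
0.692 | 4 | 0.5%
Other values (507) | 718 | 93.6%

Minimum 5 values
Value | Count | Frequency (%)
0.078 | 1 | 0.1%
0.084 | 1 | 0.1%
0.085 | 2 | 0.3%
0.088 | 2 | 0.3%
0.089 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
2.42 | 1 | 0.1%
2.329 | 1 | 0.1%
2.288 | 1 | 0.1%
2.137 | 1 | 0.1%
1.893 | 1 | 0.1%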

Age
Real number (ℝ≥0)

Distinct: 52
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.25423729
Minimum: 21
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 21
5th percentile: 21
Q1: 24
Median: 29
Q3: 41
95th percentile: 58
Maximum: 81
Range: 60
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.76207916
Coefficient of variation (CV): 0.3537016669
Kurtosis: 0.6397407766
Mean: 33.25423729
Median Absolute Deviation (MAD): 7
Skewness: 1.127991799
Sum: 25506
Variance: 138.3465062
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
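Common values
Value | Count | Frequency (%)
22 | 72 | 9.4%
21 | 63 | 8.2%
25 | 48 | 6.3%
24 | 46 | 6.0%
23 | 37 | 4.8%
28 | 35 | 4.6%
26 | 33 | 4.3%
27 | 32 | 4.2%
29 | 29 | 3.8%
31 | 24 | 3.1%
Other values (42) | 348 | 45.4%

Minimum 5 values
Value | Count | Frequency (%)
21 | 63 | 8.2%
22 | 72 | 9.4%
23 | 37 | 4.8%
24 | 46 | 6.0%
25 | 48 | 6.3%

Maximum 5 values
Value | Count | Frequency (%)
81 | 1 | 0.1%
72 | 1 | 0.1%
70 | 1 | 0.1%
69 | 2 | 0.3%
68 | 1 | 0.1%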

Outcome
Categorical

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB
0: 499
1: 268

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 767
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
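Python's standard library exposes the Unicode general category through `unicodedata.category`. It does not expose the Script or Block properties, so this sketch covers categories only. The helper name `character_categories` is hypothetical.

```python
import unicodedata
from collections import Counter


def character_categories(texts):
    """Count characters by Unicode general category (e.g. 'Nd' = Decimal Number)."""
    counts = Counter()
    for text in texts:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts


outcome = ["1", "0", "1", "0", "1"]  # first five Outcome values, as strings
print(character_categories(outcome))  # Counter({'Nd': 5})
```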

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 1
Value | Count | Frequency (%)
0 | 499 | 65.1%
1 | 268 | 34.9%
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 499 | 65.1%
1 | 268 | 34.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 767 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
0 | 499 | 65.1%
1 | 268 | 34.9%

Most occurring scripts

Value | Count | Frequency (%)
Common | 767 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 499 | 65.1%
1 | 268 | 34.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 767 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 499 | 65.1%
1 | 268 | 34.9%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. This implies that for an exact linear relationship the slope, i.e. the angle to the x-axis, does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
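A direct transcription of this definition in plain Python follows, using the hypothetical helper `pearson_r`. The 1/n or 1/(n - 1) normalization cancels between numerator and denominator, so it is omitted.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


x = [1.0, 2.0, 4.0, 7.0]
y = [3.0, 1.0, 5.0, 6.0]
# Shifting and positively rescaling y leaves r unchanged.
print(pearson_r(x, y), pearson_r(x, [10 + 3 * v for v in y]))
```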

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations.
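The procedure can be sketched as "rank, then apply Pearson's r". The sketch assumes the usual convention that tied values receive the average of their positions. The helper names are hypothetical, and `pearson_r` is repeated so the block is self-contained.

```python
import math


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

For `x = [1, 2, 3, 4]` and `y = [1, 4, 9, 16]`, the relationship is monotonic but not linear. Spearman's ρ is therefore 1, while Pearson's r is slightly below 1.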

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and 1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2 for n observations.
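Counting pairs directly gives the variant described here, usually called tau-a. This is a minimal sketch with a hypothetical helper name. Library implementations such as SciPy's `scipy.stats.kendalltau` default to the tie-adjusted tau-b, so their results can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # pairs tied in x or in y (s == 0) count as neither
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For example, `x = [1, 2, 3]` and `y = [1, 3, 2]` give two concordant pairs and one discordant pair, so τ = 1/3.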

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.

Sample

First rows

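Index | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome
0 | 6 | 148.0 | 72.000000 | 35.000000 | 79.90352 | 33.600000 | 0.627 | 50 | 1
1 | 1 | 85.0 | 66.000000 | 29.000000 | 79.90352 | 26.600000 | 0.351 | 31 | 0
2 | 8 | 183.0 | 64.000000 | 20.522816 | 79.90352 | 23.300000 | 0.672 | 32 | 1
3 | 1 | 89.0 | 66.000000 | 23.000000 | 94.00000 | 28.100000 | 0.167 | 21 | 0
4 | 0 | 137.0 | 40.000000 | 35.000000 | 168.00000 | 43.100000 | 2.288 | 33 | 1
5 | 5 | 116.0 | 74.000000 | 20.522816 | 79.90352 | 25.600000 | 0.201 | 30 | 0
6 | 3 | 78.0 | 50.000000 | 32.000000 | 88.00000 | 31.000000 | 0.248 | 26 | 1
7 | 10 | 115.0 | 69.104302 | 20.522816 | 79.90352 | 35.300000 | 0.134 | 29 | 0
8 | 2 | 197.0 | 70.000000 | 45.000000 | 543.00000 | 30.500000 | 0.158 | 53 | 1
9 | 8 | 125.0 | 96.000000 | 20.522816 | 79.90352 | 31.994654 | 0.232 | 54 | 1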

Last rows

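Index | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome
757 | 0 | 123.0 | 72.0 | 20.522816 | 79.90352 | 36.3 | 0.258 | 52 | 1
758 | 1 | 106.0 | 76.0 | 20.522816 | 79.90352 | 37.5 | 0.197 | 26 | 0
759 | 6 | 190.0 | 92.0 | 20.522816 | 79.90352 | 35.5 | 0.278 | 66 | 1
760 | 2 | 88.0 | 58.0 | 26.000000 | 16.00000 | 28.4 | 0.766 | 22 | 0
761 | 9 | 170.0 | 74.0 | 31.000000 | 79.90352 | 44.0 | 0.403 | 43 | 1
762 | 9 | 89.0 | 62.0 | 20.522816 | 79.90352 | 22.5 | 0.142 | 33 | 0
763 | 10 | 101.0 | 76.0 | 48.000000 | 180.00000 | 32.9 | 0.171 | 63 | 0
764 | 2 | 122.0 | 70.0 | 27.000000 | 79.90352 | 36.8 | 0.340 | 27 | 0
765 | 5 | 121.0 | 72.0 | 23.000000 | 112.00000 | 26.2 | 0.245 | 30 | 0
766 | 1 | 126.0 | 60.0 | 20.522816 | 79.90352 | 30.1 | 0.349 | 47 | 1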